Batch 2 - Class 262 - Math in Genetics II (Genetic Algorithms)
Zoom: send meeting Id and password
Start recording
Preclass Exercise:
(Geoffrey - 143) Three dutchmen and their wives go shopping hogs, and each buys as many hogs as he or she pays for each hog. Each husband spends Rs 63 more than respective wife. Husbands are Hendrick, Elas and Cornelius. Wives are Gurtrun, Katrun and Anna. Hendrick buys 23 more hogs than Katrun, while Elas buys 11 more than Gurtrun. What are the names of each man's wife?
Answer: Pairs are 32/31, 12/9, and 8/1 - we can now see integers that differ by 23 and 11. So its Hendrick-Anna, Elas-Katrun and Cornelius-Gurtrun
Genes are the mechanism of hereditary transmission of features
Example: one gene, two alleles - can have two (color of peas) or more (human blood type) values. Example: multiple genes for a feature (4 for human skin color)
During sexual reproduction, one of the alleles from each parent combine to form the gene for the child.
Notion of dominant and recessive genes - Implies that certain genes may pass from one generation to next without being expressed
Applications in
Understanding how features get transmitted
Understanding of diseases and their transmission/ control
The expression of genes is controlled by production of proteins, and is not always dominant/recessive - it can be intermediate (human pitch of voice) or additive (human skin color)
Alleles do not get propagated at 50% (in case of two alleles) probability in all cases. Genes that can translate to higher survival of an organism start to show dominance in a population (for example, different skin colors in tropical versus polar regions). The key elements of evolution are
Hereditary transmission, providing accumulation of favorable changes
Diversity in genetic mix, providing opportunity for differentiation
Survival advantage
Genetic Algorithms
Computing that leverages the fundamental principles of natural selection
Hereditary transmission - for persistence of "good" variants
Mechanism for variation - for example, mutation, or diversity at starting point
Survival advantage
Consider the problem of monkey's typing random strings. What are the chances of a monkey randomly typing "to be or not to be that is the question" - say the keyboard has 26 characters and a space (total 27 characters)
So chance of getting each character right is 1/27
So chance of getting the full string right (1/27)^39 ~ 1 in 6.6x10^55. So at 1 million phrases a second, its 9x10^42 years! Estimated age of universe is 13 billion years only 🙂
How could we write a program to discover this phrase faster? Lets apply principles of genetics
At each generation we will have a number of "organisms" - "sentences" - of length 39
We will start with a random set
Now we need
Diversity/ Variation - we have some starting diversity, but we will add more by (a) making inheritance from two parents; and (b) a chance of mutation
Hereditary mechanism - so lets say the next generation derives from the previous generation
Selection - we will select probability of reproduction dependent on "fitness" which is just how well a particular organism/sentence matches what we are looking for
Illustrate on simpler phrase "unicorn".
Lets say the first generation is randomly generated, and has "unijorm", "pancake", "aasdfss" and "popcorn"
Calculate fitness for each (just the number of characters that match "unicorn")
Select two members basis probability based on fitness
Cross them to create a new member of population (for now, we will assume the older members continue to operate)
We may need mutation (change one letter randomly at a random rate) to drive more variation. In this example, we will ignore this
For example, if we randomly split and took first three letters of "unijorm" and last four of "popcorn", then we get "unicorn". Normally the process may take more steps, but can lead to members with higher fitness. Take some examples with kids
In each generation, we can add more children, and we can retire the previous generation
This is an example where we know the answer and we want to get to it, so we could just type it. However, it we dont know the answer and the search space is large, how do we get there? What is an algorithm which can do this?
More practical example: Box Car 2D. Note that here we dont know the answer but we can test the answer. The objective is to design a car which can travel the maximum distance (http://boxcar2d.com/about.html#theCar)
Definition of car: A solid object with eight vectors, and two wheels attached at random vertices. So each car can be represented by angle and length of a vector, and then vertex, axle angle and wheel radius, so total of 8*2+2*3=22 variables
How do we apply principles of genetic selection here?
Variation: Start with a population. Crossover at inheritance (of the 22 variable string, pick part from each parent). Mutation (take one of the 22 variables and change it randomly with some probability)
Hereditary: Changes persist - we derive the next generation by cross-over of existing generation
Selection - a fitness function, and probability of survival/ propagation depends on fitness function. Here the fitness function could be the distance that car is able to travel.
What do you expect to happen over several generations?
"Good" features are likely to be chosen for inheritance, and crossover/ mutation provides the mechanism to make them better. Of course, they can become worse but then they have lesser chances of being chosen.
A lot of problems of this nature, where we know how to test for a good answer, but don't know what the good answer is, can be solved using genetic algorithms
Homework:
Consider the maze below
We are trying to find a path through the maze. Each path may be encoded as a sequence of "turns" - what does the path do when it encounters an obstacle (in absence of an obstacle, the path continues in a straight line). The actions might be L,R,B (left, right, back). Let us also say that the path has maximum length of 12 - i.e. if the path doesn't lead to "*" after 12 "turns" and encounters an obstacle, that path is over.
Think about how you would design a genetic algorithm to solve this. What would your coding look like? How will you introduce variation? How would you ensure that "good" features will persist? How will you define "fitness"?
If you know how to code, try to code this! Else try to run your "algorithm" on a random starting string, and see if the fitness scores improve from one generation to another.
Some possible lines of thinking
Each path is an array of 12 turns, each turn being value L,R or B
Start with a random selection of paths (4 paths should be good to start with, as long as they are not similar)
At each generation, cross two paths - perhaps inherit alternate turns from each parent
At some probability, say 10%, introduce a mutation - change a random turn to a random value (can skip initially)
Fitness could be the maximum number of columns a particular path travels before dying
We have heard about "survival of the fittest" - "fittest" what?
Darwinian theory emphasizes on survival of the fittest organism, and hence genes that increase the survival of organism should multiply faster - see example on human skin color above.
Do all organisms survive on their own merit? What about parasites?
What is each gene competing against? Other genes and not necessarily other organisms
The current theory is that as long as a gene has a better survival chance against its competing genes, it can win, even if it doesn't increase the survival of the organism. Every gene for itself. Note that this doesn't contradict Darwinian theory - genes that enhance survival of the organism may in turn also enhance their own survival.
Such genes may not even express themselves, or in some cases might even be detrimental to survival of the organism (as long as other genes can carry that burden, and these genes can ride along with those ones)
"Organism" is just a container for genes - the organism doesn't persist across generations, so can't really evolve
Examples:
Segregation distortion, wherein when an allele gets chosen for reproduction, one has advantage over the other. For example, in certain organisms, the "female" gene is more likely to be chosen as against the "male" gene while reproduction happens. What would this lead to? Higher proportion of females in population - note that this does not have organism level survival advantage. In some cases, this can actually drive a population to extinction.
Note that genes are not "conscious" beings, so the notion of being "selfish" is not in the conscious behavior, but the same mechanism of hereditary transmission, diversity in the mix and survival advantage. Its just operates at a genetic level rather than at an organism level.